BigSparse: High-performance external graph analytics
نویسندگان
چکیده
We present BigSparse, a fully external graph analytics system that picks up where semi-external systems like FlashGraph and X-Stream, which only store vertex data in memory, left off. BigSparse stores both edge and vertex data in an array of SSDs and avoids random updates to the vertex data, by first logging the vertex updates and then sorting the log to sequentialize accesses to the SSDs. This newly introduced sorting overhead is reduced significantly by interleaving sorting with vertex reduction operations. In our experiments on a server with 32GB to 64GB of DRAM, BigSparse outperforms other in-memory and semi-external graph analytics systems for algorithms such as PageRank, BreadthFirst Search, and Betweenness-Centrality for terabyte-size graphs with billions of vertices. BigSparse is capable of highspeed analytics of much larger graphs, on the same machine configuration.
منابع مشابه
The impact of interwoven integration practices on supply chain value addition and firm performance
Drawing on the supply chain (SC) management literature, this article conceptualizes and empirically tests a framework that shows how both external and internal integration practices are significant and positively associated with SC value addition and firm performance. The framework also tests the impact of value addition as a reinforcing factor on firm performance. The outcome of this investiga...
متن کاملMOCgraph: Scalable Distributed Graph Processing Using Message Online Computing
Existing distributed graph processing frameworks, e.g., Pregel, Giraph, GPS and GraphLab, mainly exploit main memory to support flexible graph operations for efficiency. Due to the complexity of graph analytics, huge memory space is required especially for those graph analytics that spawn large intermediate results. Existing frameworks may terminate abnormally or degrade performance seriously w...
متن کاملGraphMat: High performance graph analytics made productive
Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix ope...
متن کاملGraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster ...
متن کاملExplore Efficient Data Organization for Large Scale Graph Analytics and Storage
Many Big Data analytics essentially explore the relationship among interconnected entities, which are naturally represented as graphs. However, due to the irregular data access patterns in the graph computations, it remains a fundamental challenge to deliver highly efficient solutions for large scale graph analytics. Such inefficiency restricts the utilization of many graph algorithms in Big Da...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1710.07736 شماره
صفحات -
تاریخ انتشار 2017